Goto

Collaborating Authors

 maximin strategy





Efficient Minimax Strategies for Square Loss Games

Neural Information Processing Systems

We consider online prediction problems where the loss between the prediction and the outcome is measured by the squared Euclidean distance and its generalization, the squared Mahalanobis distance.


Safe Equilibrium

Ganzfried, Sam

arXiv.org Artificial Intelligence

In designing a strategy for a multiagent interaction an agent must balance between the assumption that opponents are behaving rationally with the risks that may occur if opponents behave irrationally. Most classic game-theoretic solution concepts, such as Nash equilibrium (NE), assume that all players are behaving rationally (and that this fact is common knowledge). On the other hand, a maximin strategy plays a strategy that has the largest worst-case guaranteed expected payoff; this limits the potential downside against a worstcase and potentially irrational opponent, but can also cause us to achieve significantly lower payoff against rational opponents. In two-player zero-sum games, Nash equilibrium and maximin strategies are equivalent (by the minimax theorem), and these two goals are completely aligned. But in non-zero-sum games and games with more than two players, this is not the case. In these games we can potentially obtain arbitrarily low payoff by following a Nash equilibrium strategy, but if we follow a maximin strategy will likely be playing far too conservatively. While the assumption that opponents are exhibiting a degree of rationality, as well as the desire to limit worst-case performance in the case of irrational opponents, are both desirable, neither the Nash equilibrium nor maximin solution concept is definitively compelling on its own. We propose a new solution concept that balances between these two extremes.


Efficient Minimax Strategies for Square Loss Games

Koolen, Wouter M., Malek, Alan, Bartlett, Peter L.

Neural Information Processing Systems

We consider online prediction problems where the loss between the prediction and the outcome is measured by the squared Euclidean distance and its generalization, the squared Mahalanobis distance. We derive the minimax solutions for the case where the prediction and action spaces are the simplex (this setup is sometimes called the Brier game) and the $\ell_2$ ball (this setup is related to Gaussian density estimation). We show that in both cases the value of each sub-game is a quadratic function of a simple statistic of the state, with coefficients that can be efficiently computed using an explicit recurrence relation. The resulting deterministic minimax strategy and randomized maximin strategy are linear functions of the statistic.


New Criteria and a New Algorithm for Learning in Multi-Agent Systems

Powers, Rob, Shoham, Yoav

Neural Information Processing Systems

We propose a new set of criteria for learning algorithms in multi-agent systems, one that is more stringent and (we argue) better justified than previous proposed criteria. Our criteria, which apply most straightforwardly inrepeated games with average rewards, consist of three requirements: (a) against a specified class of opponents (this class is a parameter of the criterion) the algorithm yield a payoff that approaches the payoff of the best response, (b) against other opponents the algorithm's payoff at least approach (and possibly exceed) the security level payoff (or maximin value),and (c) subject to these requirements, the algorithm achieve a close to optimal payoff in self-play. We furthermore require that these average payoffs be achieved quickly. We then present a novel algorithm, and show that it meets these new criteria for a particular parameter class, the class of stationary opponents. Finally, we show that the algorithm is effective not only in theory, but also empirically. Using a recently introduced comprehensive game theoretic test suite, we show that the algorithm almost universally outperforms previous learning algorithms.


New Criteria and a New Algorithm for Learning in Multi-Agent Systems

Powers, Rob, Shoham, Yoav

Neural Information Processing Systems

We propose a new set of criteria for learning algorithms in multi-agent systems, one that is more stringent and (we argue) better justified than previous proposed criteria. Our criteria, which apply most straightforwardly in repeated games with average rewards, consist of three requirements: (a) against a specified class of opponents (this class is a parameter of the criterion) the algorithm yield a payoff that approaches the payoff of the best response, (b) against other opponents the algorithm's payoff at least approach (and possibly exceed) the security level payoff (or maximin value), and (c) subject to these requirements, the algorithm achieve a close to optimal payoff in self-play. We furthermore require that these average payoffs be achieved quickly. We then present a novel algorithm, and show that it meets these new criteria for a particular parameter class, the class of stationary opponents. Finally, we show that the algorithm is effective not only in theory, but also empirically. Using a recently introduced comprehensive game theoretic test suite, we show that the algorithm almost universally outperforms previous learning algorithms.


New Criteria and a New Algorithm for Learning in Multi-Agent Systems

Powers, Rob, Shoham, Yoav

Neural Information Processing Systems

We propose a new set of criteria for learning algorithms in multi-agent systems, one that is more stringent and (we argue) better justified than previous proposed criteria. Our criteria, which apply most straightforwardly in repeated games with average rewards, consist of three requirements: (a) against a specified class of opponents (this class is a parameter of the criterion) the algorithm yield a payoff that approaches the payoff of the best response, (b) against other opponents the algorithm's payoff at least approach (and possibly exceed) the security level payoff (or maximin value), and (c) subject to these requirements, the algorithm achieve a close to optimal payoff in self-play. We furthermore require that these average payoffs be achieved quickly. We then present a novel algorithm, and show that it meets these new criteria for a particular parameter class, the class of stationary opponents. Finally, we show that the algorithm is effective not only in theory, but also empirically. Using a recently introduced comprehensive game theoretic test suite, we show that the algorithm almost universally outperforms previous learning algorithms.